Although augmentations (e.g., perturbation of graph edges, image crops) boost the efficiency of Contrastive Learning (CL), feature-level augmentation is another plausible and complementary, yet not well-researched, strategy. We therefore present a novel spectral feature augmentation for contrastive learning on graphs (and images). To this end, for each data view, we estimate a low-rank approximation per feature map and subtract that approximation from the map to obtain its complement. This is achieved by the incomplete power iteration proposed herein, a non-standard power iteration regime which enjoys two valuable byproducts (under a mere one or two iterations): (i) it partially balances the spectrum of the feature map, and (ii) it injects noise into the rebalanced singular values of the feature map (spectral augmentation). For two views, we align these rebalanced feature maps, as such an improved alignment step can focus more on the less dominant singular values of the matrices of both views, whereas the spectral augmentation does not affect the spectral angle alignment (singular vectors are not perturbed). We derive the analytical form for: (i) the incomplete power iteration, to capture its spectrum-balancing effect, and (ii) the variance of singular values augmented implicitly by the noise. We also show that the spectral augmentation improves the generalization bound. Experiments on graph/image datasets show that our spectral feature augmentation outperforms baselines, is complementary to other augmentation strategies, and is compatible with various contrastive losses.
translated by 谷歌翻译
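The low-rank subtraction described in the first abstract above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `incomplete_power_iteration`, the random Gaussian start vector, and the per-step normalization are all hypothetical choices. The sketch runs one or two power-iteration steps on the feature map `H`, forms a rank-one approximation from the resulting direction, and subtracts it to obtain the complement.

```python
import numpy as np


def incomplete_power_iteration(H, num_iters=1, rng=None):
    """Return (complement, low_rank) for a feature map H of shape (n, d).

    Assumption: a rank-one estimate built from a few power-iteration steps
    started at a random Gaussian vector. This is an illustrative sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = rng.standard_normal(H.shape[1])      # random start direction in R^d
    for _ in range(num_iters):               # deliberately stopped early
        r = H.T @ (H @ r)                    # one power-iteration step on H^T H
        r = r / np.linalg.norm(r)            # rescale; the direction is unchanged
    low_rank = np.outer(H @ r, r)            # rank-one approximation H r r^T
    return H - low_rank, low_rank            # complement of the approximation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((100, 16))
    H[:, 0] *= 20.0                          # create one dominant direction
    complement, _ = incomplete_power_iteration(H, num_iters=1, rng=rng)
    print("singular values before:", np.linalg.svd(H, compute_uv=False)[:3])
    print("singular values after: ", np.linalg.svd(complement, compute_uv=False)[:3])
```

Because the iteration is stopped after one or two steps, the direction `r` still depends on the random start, so repeated calls yield slightly different complements. This is one way to read the noise-injection effect mentioned in the abstract.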
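Graph contrastive learning (GCL) improves graph representation learning, leading to SOTA results on various downstream tasks. The graph augmentation step is a vital but scarcely studied step of GCL. In this paper, we show that the node embeddings obtained via graph augmentations are highly biased, which to some extent limits contrastive models from learning discriminative features for downstream tasks. We therefore consider augmenting the hidden features instead (feature augmentation). Inspired by so-called matrix sketching, we propose COSTA, a novel covariance-preserving feature-space augmentation framework for GCL, which generates augmented features by maintaining a "good sketch" of the original features. To highlight the superiority of feature augmentation with COSTA, we investigate a single-view setting (in addition to the multi-view one) which conserves memory and computation. We show that feature augmentation with COSTA achieves results comparable to or better than those of graph-augmentation-based models.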
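As a rough illustration of what keeping a "good sketch" of the features can mean, the following NumPy example applies a Gaussian random projection along the node dimension. This is a standard matrix-sketching construction chosen here as an assumption; the abstract does not say which sketch COSTA uses, and the name `sketch_features` is hypothetical. With entries of variance 1/k, the sketch matrix `P` satisfies E[PᵀP] = I, so the second-moment matrix HᵀH is preserved in expectation.

```python
import numpy as np


def sketch_features(H, k, rng=None):
    """Sketch an (n, d) feature matrix H down to k rows.

    Assumption: a Gaussian sketch P of shape (k, n) with N(0, 1/k) entries,
    so that (P H)^T (P H) equals H^T H in expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = H.shape[0]
    P = rng.standard_normal((k, n)) / np.sqrt(k)   # random sketching matrix
    return P @ H                                   # augmented (sketched) features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((200, 4))
    H_aug = sketch_features(H, k=2000, rng=rng)
    gram, gram_aug = H.T @ H, H_aug.T @ H_aug
    rel_err = np.linalg.norm(gram_aug - gram) / np.linalg.norm(gram)
    print("relative error of the sketched Gram matrix:", rel_err)
```

A larger `k` gives a tighter approximation of HᵀH at a higher cost. Drawing a fresh `P` on each call yields a different augmented view of the same features.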
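In this paper, we propose a few-shot learning pipeline for 3D skeleton-based action recognition via Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE). To factor out the misalignment between query and support sequences of 3D body joints, we propose an advanced variant of Dynamic Time Warping which jointly models each smooth path between the query and support frames, simultaneously achieving the best alignment in the temporal and simulated camera viewpoint spaces for end-to-end learning under limited few-shot training data. Sequences are encoded with a temporal block encoder based on Simple Spectral Graph Convolution, a lightweight linear Graph Neural Network backbone (we also include a setting with a transformer). Finally, we propose a similarity-based loss which encourages the alignment of sequences of the same class while preventing the alignment of unrelated sequences. We demonstrate state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton, and UWA3D Multiview Activity II.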
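For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the classic algorithm that the abstract above builds on. It is plain temporal DTW only: it does not model camera viewpoints or path smoothness, and it is not the JEANIE variant. The function name `dtw_distance` and the Euclidean frame cost are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(query, support):
    """Classic DTW cost between two sequences of shape (T_q, D) and (T_s, D).

    Assumption: Euclidean distance between frames and the standard three-way
    step pattern (match, insertion, deletion).
    """
    q = np.asarray(query, dtype=float)
    s = np.asarray(support, dtype=float)
    # Pairwise frame-to-frame distances, shape (T_q, T_s).
    cost = np.linalg.norm(q[:, None, :] - s[None, :, :], axis=-1)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)    # accumulated cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m]


if __name__ == "__main__":
    query = np.array([[0.0], [1.0], [2.0]])
    slowed = np.repeat(query, 2, axis=0)     # same motion at half speed
    print("query vs. slowed copy:", dtw_distance(query, slowed))
```

The time-stretched copy aligns at zero cost because each query frame may match several support frames. This invariance to execution speed is what makes DTW attractive for comparing action sequences.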
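In this paper, we improve Generative Adversarial Networks by incorporating a manifold learning step into the discriminator. We consider locality-constrained linear and subspace-based manifolds, as well as locality-constrained non-linear manifolds. In our design, the manifold learning and coding steps are intertwined with the layers of the discriminator, with the goal of attracting intermediate feature representations onto the manifolds. We adaptively balance the discrepancy between the feature representations and their manifold view, which represents a trade-off between denoising on the manifold and refining the manifold. We conclude that locality-constrained non-linear manifolds have the upper hand over linear manifolds due to their non-uniform density and smoothness. We show substantial improvements over different recent state-of-the-art baselines.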
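Current non-rigid object keypoint detectors perform well on a chosen set of species and body parts, and require a large number of labelled keypoints for training. Moreover, their heatmaps, tailored to specific body parts, cannot recognize novel keypoints (keypoints not labelled for training) on unseen species. We raise an interesting yet challenging question: how can both base keypoints (annotated for training) and novel keypoints be detected for unseen species given only a few annotated samples? We therefore propose a versatile Few-shot Keypoint Detection (FSKD) pipeline, which can detect a varying number of keypoints of different kinds. Our FSKD provides uncertainty estimates for the predicted keypoints. Specifically, FSKD involves main and auxiliary keypoint representation learning, similarity learning, and keypoint localization with uncertainty modeling to tackle localization noise. Moreover, we model the uncertainty across groups of keypoints with a multivariate Gaussian distribution to exploit implicit correlations between neighboring keypoints. We show the effectiveness of our FSKD on (i) novel keypoint detection for unseen species, (ii) few-shot Fine-Grained Visual Recognition (FGVR), and (iii) Semantic Alignment (SA) downstream tasks. For FGVR, the detected keypoints improve classification accuracy. For SA, we showcase a novel thin-plate-spline warping that uses the estimated keypoint uncertainty under imperfect keypoint correspondences.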
Although deep end-to-end learning methods have shown their superiority in removing non-uniform motion blur, major challenges remain with the current multi-scale and scale-recurrent models: 1) deconvolution/upsampling operations in the coarse-to-fine scheme result in expensive runtime; 2) simply increasing the model depth with finer-scale levels cannot improve the quality of deblurring. To tackle these problems, we present a deep hierarchical multi-patch network inspired by Spatial Pyramid Matching to deal with blurry images via a fine-to-coarse hierarchical representation. To deal with the performance saturation w.r.t. depth, we propose a stacked version of our multi-patch model. Our proposed basic multi-patch model achieves state-of-the-art performance on the GoPro dataset while enjoying a 40× faster runtime compared to current multi-scale methods. Taking 30ms to process an image at 1280×720 resolution, it is the first real-time deep motion deblurring model for 720p images at 30fps. For stacked networks, significant improvements (over 1.2dB) are achieved on the GoPro dataset by increasing the network depth. Moreover, by varying the depth of the stacked model, one can adapt the performance and runtime of the same network to different application scenarios.
Audio DeepFakes are artificially generated utterances created using deep learning methods with the main aim of fooling listeners; most such audio is highly convincing. Their quality is sufficient to pose a serious threat to security and privacy, for example by undermining the reliability of news or enabling defamation. To prevent these threats, multiple neural network-based methods for detecting generated speech have been proposed. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding superficial (difficult for a human to spot) changes to the input data. Our contributions include evaluating the robustness of 3 detection architectures against adversarial attacks in two scenarios (white-box and using the transferability mechanism) and then enhancing it through adversarial training performed with our novel adaptive training method.
This short report reviews the current state of the research and methodology on theoretical and practical aspects of Artificial Neural Networks (ANN). It was prepared to gather the state-of-the-art knowledge needed to construct complex, hypercomplex and fuzzy neural networks. The report reflects the individual interests of the authors and by no means can be treated as a comprehensive review of the ANN discipline. Considering the fast development of this field, it is currently impossible to do a detailed review of a considerable number of pages. The report is an outcome of the Project 'The Strategic Research Partnership for the mathematical aspects of complex, hypercomplex and fuzzy neural networks' meeting at the University of Warmia and Mazury in Olsztyn, Poland, organized in September 2022.
This paper presents the Crowd Score, a novel method to assess the funniness of jokes using large language models (LLMs) as AI judges. Our method relies on inducing different personalities into the LLM and aggregating the votes of the AI judges into a single score to rate jokes. We validate the votes using an auditing technique that uses the LLM to check whether the explanation for a particular vote is reasonable. We tested our methodology on 52 jokes with a crowd of four AI voters with different humour types: affiliative, self-enhancing, aggressive and self-defeating. Our results show that few-shot prompting leads to better results than zero-shot prompting for the voting question. Personality induction showed that aggressive and self-defeating voters are significantly more inclined than affiliative and self-enhancing voters to find jokes funny within a set of aggressive/self-defeating jokes. The Crowd Score follows the same trend as human judges, assigning higher scores to jokes that human judges also consider funnier. We believe that our methodology could be applied to other creative domains such as stories, poetry, slogans, etc. It could help the adoption of a flexible and accurate standard approach for comparing different work in the CC community under a common metric and, by minimizing human participation in assessing creative artefacts, it could accelerate their prototyping and reduce the cost of hiring human participants to rate them.
Recently proposed systems for open-domain question answering (OpenQA) require large amounts of training data to achieve state-of-the-art performance. However, annotated data is known to be time-consuming and therefore expensive to acquire. As a result, appropriate datasets are available only for a handful of languages (mainly English and Chinese). In this work, we introduce and publicly release PolQA, the first Polish dataset for OpenQA. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7,097,322 candidate passages. Each question is classified according to its formulation, its type, and the entity type of the answer. This resource allows us to evaluate the impact of different annotation choices on the performance of the QA system and to propose an efficient annotation strategy that increases passage retrieval performance by 10.55 p.p. while reducing the annotation cost by 82%.